DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-Bit CNNs
TABLE 4.2
Effect of including or omitting the reconstruction error and the
tangent-direction constraint on the ImageNet data set. The
architecture used for the experiments is DCP-NAS-L.

Tangent direction D(α̂)   Reconstruction error L_R(ŵ, β)   Top-1   Top-5
✗                         ✗                                 66.7    83.3
✓                         ✗                                 68.3    85.0
✗                         ✓                                 68.2    85.1
✓                         ✓                                 72.4    89.2
used for both the Parent and Child models. When applied to the Child model, w denotes
the weights reconstructed from the binarized weights, that is, w = β ◦ b_ŵ.
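The reconstruction w = β ◦ b_ŵ can be sketched as follows. This is a minimal NumPy illustration, not the DCP-NAS implementation: the text only states the elementwise form of the reconstruction, so the choice of β as the per-output-channel mean absolute value (in the style of XNOR-Net binarization) is an assumption here.

```python
import numpy as np

def reconstruct_binary_weights(w_hat):
    """Reconstruct real-valued weights from 1-bit weights.

    b_w = sign(w_hat) gives the binarized weights; beta is a
    per-output-channel scale (assumed here to be the mean absolute
    value, as in XNOR-style binarization) so that w = beta * b_w
    approximates w_hat.
    """
    b_w = np.sign(w_hat)          # 1-bit weights in {-1, +1}
    b_w[b_w == 0] = 1.0           # map exact zeros to +1 by convention
    # scale: mean |w_hat| over every axis except the output-channel axis
    beta = np.abs(w_hat).mean(axis=tuple(range(1, w_hat.ndim)),
                              keepdims=True)
    return beta * b_w             # reconstructed weights w

w_hat = np.array([[0.5, -1.5],
                  [2.0, -1.0]])
w = reconstruct_binary_weights(w_hat)
```

For the toy 2x2 filter above, each row's scale is its mean absolute value (1.0 and 1.5), so the reconstruction keeps each weight's sign while sharing one magnitude per output channel.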
4.4.7
Ablation Study
Effectiveness of Tangent Propagation In this section, we evaluate the effect of
tangent propagation on the performance of DCP-NAS; the hyperparameters involved
are λ and μ. We also discuss the effectiveness of the reconstruction
error. The implementation details are given below.
To search for a better binary neural architecture, λ and μ balance three terms: the
KL divergence f̃(ŵ, α̂, β) that supervises the Child, the reconstruction error for binary
weights L_R(ŵ, β), and the tangent-direction constraint D(α̂). We evaluated λ and μ on the
ImageNet data set with the DCP-NAS-L architecture. To better understand tangent
propagation on the large-scale ImageNet ILSVRC12 dataset, we examined how
the tangent-direction constraint affects performance. Based on the experiments described
above, we first set λ to 5e-3 and μ to 0.2 when they are used. As shown in Table 4.2, both
FIGURE 4.15
With different λ and μ, we evaluated the Top-1 accuracies of DCP-NAS-L
on ImageNet.
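The role of λ and μ described above can be sketched as a weighted sum of the three loss terms. This is only an illustration of the balancing: the real losses operate on network outputs, binary weights, and architecture parameters, and the text does not state which coefficient multiplies which term, so the pairing below (λ with the reconstruction error, μ with the tangent constraint) is an assumption.

```python
def total_child_loss(kl_div, recon_err, tangent_constraint,
                     lam=5e-3, mu=0.2):
    """Sketch of the Child objective: the KL divergence supervising the
    Child, plus the reconstruction error L_R weighted by lambda and the
    tangent-direction constraint D(alpha_hat) weighted by mu.

    Defaults follow the settings reported in the text (lambda = 5e-3,
    mu = 0.2); the term-to-coefficient pairing is assumed.
    """
    return kl_div + lam * recon_err + mu * tangent_constraint

# Example: with these weights, the KL term dominates and the two
# auxiliary terms act as small regularizers.
loss = total_child_loss(kl_div=1.0, recon_err=2.0, tangent_constraint=3.0)
```

With λ = 5e-3 and μ = 0.2, the reconstruction and tangent terms contribute only a small fraction of the total, which matches their role as constraints rather than the primary training signal.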